{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# hvPlot.kde\n",
    "\n",
    "```{eval-rst}\n",
    ".. currentmodule:: hvplot\n",
    "\n",
    ".. automethod:: hvPlot.kde\n",
    "```\n",
    "\n",
    "## Backend-specific styling options\n",
    "\n",
    "```{eval-rst}\n",
    ".. backend-styling-options:: kde\n",
    "```\n",
    "\n",
    "## Examples\n",
    "\n",
    "### Basic KDE\n",
    "\n",
    "This example shows a KDE plot built from a sample of a Weibull distribution using `kde` with its default parameters."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import hvplot.pandas # noqa\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "\n",
    "df = pd.DataFrame({'values': np.random.weibull(5, size=1000)})\n",
    "\n",
    "df.hvplot.kde()"
   ]
  },
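  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To make the default parameters more concrete, the sketch below evaluates a one-dimensional Gaussian kernel density estimate with NumPy, using Scott's rule of thumb for the bandwidth. It is not hvPlot's implementation. It assumes a Gaussian kernel and a bandwidth expressed as a factor times the sample standard deviation, which is how SciPy's `scipy.stats.gaussian_kde` behaves. The helper `gaussian_kde_1d` is hypothetical.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "def gaussian_kde_1d(data, grid, bw_factor=None):\n",
    "    # Evaluate a 1D Gaussian KDE of `data` at every point of `grid`\n",
    "    data = np.asarray(data, dtype=float)\n",
    "    grid = np.asarray(grid, dtype=float)\n",
    "    n = data.size\n",
    "    if bw_factor is None:\n",
    "        # Scott's rule of thumb for one dimension (assumed default)\n",
    "        bw_factor = n ** (-1 / 5)\n",
    "    # Kernel standard deviation in data units\n",
    "    h = bw_factor * data.std(ddof=1)\n",
    "    # Place one Gaussian bump on each data point and average them\n",
    "    z = (grid[:, None] - data[None, :]) / h\n",
    "    return np.exp(-0.5 * z**2).sum(axis=1) / (n * h * np.sqrt(2 * np.pi))\n",
    "\n",
    "rng = np.random.default_rng(0)\n",
    "sample = rng.weibull(5, size=1000)\n",
    "xs = np.linspace(sample.min() - 1, sample.max() + 1, 500)\n",
    "density = gaussian_kde_1d(sample, xs)\n",
    "area = density.sum() * (xs[1] - xs[0])  # close to 1 for a density\n",
    "```\n",
    "\n",
    "Because each kernel integrates to 1 and the bumps are averaged, the curve integrates to approximately 1 over a grid that extends well past the data."
   ]
  },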
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's visualise the KDE of earthquake depths, using `y` to select the `depth` column of the earthquakes sample dataset."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import hvplot.pandas # noqa\n",
    "\n",
    "df = hvplot.sampledata.earthquakes(\"pandas\")\n",
    "\n",
    "df.hvplot.kde(y='depth')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Control smoothing with `bandwidth`\n",
    "\n",
    "You can control the smoothness of the estimate with the `bandwidth` argument, which accepts a positive number. Smaller values follow the data more closely and reveal more detail, while larger values produce a smoother curve. When not set, the bandwidth is computed internally using Scott's rule of thumb."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import hvplot.pandas # noqa\n",
    "\n",
    "df = hvplot.sampledata.earthquakes(\"pandas\")\n",
    "\n",
    "df.hvplot.kde(\n",
    "    y='depth', bandwidth=0.1,\n",
    "    width=300, title='bandwidth=0.1'\n",
    ") +\\\n",
    "df.hvplot.kde(\n",
    "    y='depth', bandwidth=0.5,\n",
    "    width=300, shared_axes=False, title='bandwidth=0.5'\n",
    ")"
   ]
  },
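  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The sketch below illustrates why smaller bandwidths show more detail. It assumes, as SciPy's `gaussian_kde` does when given a scalar bandwidth, that the value is a factor multiplied by the sample standard deviation to give the kernel width; whether hvPlot interprets `bandwidth` exactly this way is an assumption here. The data are synthetic, and the helpers `kde_1d` and `count_peaks` are hypothetical.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "def kde_1d(data, grid, bw_factor):\n",
    "    # Gaussian KDE whose kernel std is bw_factor times the sample std\n",
    "    data = np.asarray(data, dtype=float)\n",
    "    grid = np.asarray(grid, dtype=float)\n",
    "    h = bw_factor * data.std(ddof=1)\n",
    "    z = (grid[:, None] - data[None, :]) / h\n",
    "    return np.exp(-0.5 * z**2).sum(axis=1) / (data.size * h * np.sqrt(2 * np.pi))\n",
    "\n",
    "def count_peaks(y):\n",
    "    # Interior grid points strictly higher than both neighbours\n",
    "    return int(np.sum((y[1:-1] > y[:-2]) & (y[1:-1] > y[2:])))\n",
    "\n",
    "# Synthetic bimodal sample, not the earthquake data\n",
    "rng = np.random.default_rng(1)\n",
    "sample = np.concatenate([rng.normal(0, 1, 300), rng.normal(5, 1.5, 200)])\n",
    "xs = np.linspace(sample.min() - 3, sample.max() + 3, 2000)\n",
    "\n",
    "peaks = {bw: count_peaks(kde_1d(sample, xs, bw)) for bw in (0.1, 0.5)}\n",
    "```\n",
    "\n",
    "With a Gaussian kernel the number of local maxima does not increase as the bandwidth grows, so the `0.1` curve shows at least as many bumps as the `0.5` curve."
   ]
  },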
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Control evaluation extent with `cut`\n",
    "\n",
    "`cut` is a factor that, multiplied by the smoothing `bandwidth`, determines how far the evaluation grid extends beyond the extreme data points. When set to 0, the curve is truncated at the data limits."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import hvplot.pandas # noqa\n",
    "\n",
    "df = hvplot.sampledata.earthquakes(\"pandas\")\n",
    "\n",
    "df.hvplot.kde(y='depth', width=300, title='default') +\\\n",
    "df.hvplot.kde(y='depth', cut=0, width=300, title='cut=0')"
   ]
  },
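  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A minimal sketch of how such an evaluation grid can be built, assuming the bandwidth is expressed in data units as Scott's factor times the sample standard deviation. The `kde_support` helper, the synthetic sample and the value `cut=3` are illustrative assumptions, not hvPlot's code or defaults.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "def kde_support(data, bw_factor, cut, n_points=200):\n",
    "    # Grid spanning the data range, padded on each side by cut * bandwidth\n",
    "    data = np.asarray(data, dtype=float)\n",
    "    bw = bw_factor * data.std(ddof=1)  # kernel width in data units\n",
    "    lo = data.min() - cut * bw\n",
    "    hi = data.max() + cut * bw\n",
    "    return np.linspace(lo, hi, n_points), bw\n",
    "\n",
    "rng = np.random.default_rng(2)\n",
    "sample = rng.exponential(scale=50.0, size=500)  # synthetic, depth-like values\n",
    "scott = sample.size ** (-1 / 5)  # Scott's rule factor for 1D data\n",
    "\n",
    "grid_cut0, bw = kde_support(sample, scott, cut=0)\n",
    "grid_cut3, _ = kde_support(sample, scott, cut=3)\n",
    "```\n",
    "\n",
    "With `cut=0` the grid starts and ends exactly at the extreme data points, so the curve stops there; with `cut=3` it extends three kernel widths beyond them, letting the tails decay towards zero."
   ]
  },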
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### KDE from wide-form data\n",
    "\n",
    "When setting `y` to a list of variables, the object returned is an overlay of the distribution of each variable ([HoloViews NdOverlay](https://holoviews.org/reference/containers/bokeh/NdOverlay.html) object). This example uses multiple numerical columns from the penguins dataset to compare their distributions using a kernel density estimate."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import hvplot.pandas # noqa\n",
    "\n",
    "df = hvplot.sampledata.penguins(\"pandas\")\n",
    "\n",
    "df.hvplot.kde(\n",
    "    y=[\"bill_length_mm\", \"bill_depth_mm\"], color=[\"orange\", \"green\"],\n",
    ")"
   ]
  },
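  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Wide-form data stores each variable in its own column, while long-form data stacks the values into a single column next to a label column. As a sketch, pandas' `DataFrame.melt` converts between the two; the three rows below are illustrative values typed in by hand, not loaded from the dataset.\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "# Illustrative wide-form frame: one column per variable\n",
    "wide = pd.DataFrame({\n",
    "    'bill_length_mm': [39.1, 39.5, 40.3],\n",
    "    'bill_depth_mm': [18.7, 17.4, 18.0],\n",
    "})\n",
    "\n",
    "# Long form: one column naming the variable, one holding its value\n",
    "long_df = wide.melt(var_name='variable', value_name='value')\n",
    "```\n",
    "\n",
    "Plotting the result with `long_df.hvplot.kde(y='value', by='variable')` should give an overlay comparable to listing both columns in `y`, which is the long-form pattern shown in the section on long-form data."
   ]
  },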
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "When `subplots` is set to `True`, the object returned is a layout ([HoloViews NdLayout](https://holoviews.org/reference/containers/bokeh/NdLayout.html) object) with one plot per variable."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import hvplot.pandas # noqa\n",
    "\n",
    "df = hvplot.sampledata.penguins(\"pandas\")\n",
    "\n",
    "df.hvplot.kde(\n",
    "    y=[\"bill_length_mm\", \"bill_depth_mm\"],\n",
    "    width=300, subplots=True, shared_axes=False,\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### KDE from long-form data\n",
    "\n",
    "`by` can also be set to one or more categorical variables to generate an overlay of KDEs, one per category, or a layout when `subplots=True`. This example uses `by` to compare the distribution of bill lengths across penguin species; the second example groups by both species and sex and arranges the resulting subplots in two columns."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import hvplot.pandas # noqa\n",
    "\n",
    "df = hvplot.sampledata.penguins(\"pandas\")\n",
    "\n",
    "df.hvplot.kde(y=\"bill_length_mm\", by=\"species\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import hvplot.pandas # noqa\n",
    "\n",
    "df = hvplot.sampledata.penguins(\"pandas\")\n",
    "\n",
    "df.hvplot.kde(y=\"bill_length_mm\", by=[\"species\", \"sex\"], subplots=True, width=300).cols(2)"
   ]
  },
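  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Conceptually, `by` splits the data into one group per category and computes a separate KDE for each. The rough NumPy and pandas sketch below illustrates that idea, assuming a Gaussian kernel with Scott's rule. The data are synthetic and `kde_1d` is a hypothetical helper, not hvPlot's internals.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "\n",
    "def kde_1d(values, grid):\n",
    "    # Gaussian KDE with Scott's rule: kernel std = n ** (-1/5) * sample std\n",
    "    values = np.asarray(values, dtype=float)\n",
    "    h = values.size ** (-1 / 5) * values.std(ddof=1)\n",
    "    z = (grid[:, None] - values[None, :]) / h\n",
    "    return np.exp(-0.5 * z**2).sum(axis=1) / (values.size * h * np.sqrt(2 * np.pi))\n",
    "\n",
    "# Synthetic long-form frame: a numeric column plus a categorical label\n",
    "rng = np.random.default_rng(3)\n",
    "df = pd.DataFrame({\n",
    "    'species': np.repeat(['Adelie', 'Chinstrap', 'Gentoo'], 100),\n",
    "    'bill_length_mm': np.concatenate([rng.normal(m, 2.5, 100) for m in (38.8, 48.8, 47.5)]),\n",
    "})\n",
    "\n",
    "grid = np.linspace(df['bill_length_mm'].min() - 5, df['bill_length_mm'].max() + 5, 300)\n",
    "# One density curve per category, analogous to by='species'\n",
    "curves = {name: kde_1d(group['bill_length_mm'], grid)\n",
    "          for name, group in df.groupby('species')}\n",
    "```\n",
    "\n",
    "Each curve is normalised within its own group, so the areas are equal regardless of how many rows each category has."
   ]
  },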
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
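    "### Xarray example\n",
    "\n",
    "KDEs can also be computed from xarray data. This example selects three latitudes from the air temperature dataset and overlays the distribution of `air` at each latitude using `by`, with `alpha` keeping the overlapping curves visible."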
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import hvplot.xarray  # noqa\n",
    "\n",
    "ds = hvplot.sampledata.air_temperature(\"xarray\").sel(lat=[25, 50, 75])\n",
    "\n",
    "ds.hvplot.kde(\"air\", by=\"lat\", alpha=0.5)"
   ]
  }
 ],
 "metadata": {
  "language_info": {
   "name": "python",
   "pygments_lexer": "ipython3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
